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Most optimization problems in applied sciences realistically involve uncertainty in the parameters 
defining the cost function, of which only statistical information is known beforehand. Here we 
provide an in-depth discussion of how message passing algorithms for stochastic optimization based 

on the cavity method of statistical physics can be constructed. We focus on two basic problems, 
namely the independent set problem and the matching problem, for which we display the the general 
method and caveats for the case of the so called two-stage problem with independently distributed 
stochastic parameters. We compare the results with some greedy algorithms and briefly discuss the 
extension to more complicated stochastic multi-stage problems. 
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I. INTRODUCTION 

Most real-world optimization problems involve micertainty: the precise value of some of the parameters defining 
the cost function is often unknown, either because they are measured with insufficient accuracy, or because they are 
stochastic in nature and revealed only after part of the decisions have been taken. The purpose of the optimization 
process is thus to find solutions which are optimal in some probabilistic sense, a fact which introduces fundamental 
conceptual and computational challenges [1]. 

Examples of stochastic optimization problems can be found in all areas of applied and natural sciences, ranging 
from resource allocation and robust design problems in economics and engineering, to problems in chemistry, physics 
and biology. For instance, resilience of biological systems with respect to unpredictable environmental conditions can 
be seen as a stochastic optimization feature selected by evolution, both at the molecular and systems levels. 

Optimization under uncertainty, or Stochastic Optimization, is an ample and well established field of research which 
tries to generalize the optimization methods used in Operations Research and computer science to a probabilistic 
setting. The typical framework considered is Two-Stage Stochastic Optimization (TSSO), in which some of the 
variables have to be assigned before the stochastic parameters are specified, and the remaining variables are assigned 
after. 

TSSO poses some very tough computational challenges: typically, the size of the uncertainty space is huge (it 
increases exponentially with the size of the instance), and the underlying problem can be computationally hard 
due to discrete nature of the variables involved (typically decision variables). In order to cope with these difficulties, 
traditional approaches rely on sampling (e.g. considering "scenarios") and on the relaxation of the integer constraints. 
For example. Stochastic Programming is the extension of Linear Programming and sampling techniques to uncertain 
scenarios [2H4j. In fact, the presence of uncertainty has a deep impact on the computational complexity of a problem: 
stochastic optimization problems often belong to a superset of the NP complexity class called PSPACE [T], and many 
problems which are easy to solve when their inputs are known exactly, become intractable as soon as some form of 
uncertainty is introduced [S]. 

Statistical mechanics has played an important role in the past in the design of large scale optimization algorithms. 
Partly this was made possible by extending ideas from the statistical physics of disordered systems to applications 
in computer science. Examples range from Monte Carlo sampling and simulated annealing OEI, to the more recent 
advances in message-passing algorithms [8Hl7]. Monte Carlo can in principle be used to solve TSSO problems as 
well. What is needed is to approximate the expectation of a cost function by sampling from the space of stochastic 
parameters. Then a second Monte Carlo scheme (e.g. simulated annealing) is employed to find an optimal solution 
to the estimated cost function. This is normally a very heavy computation even for moderately large problem sizes, 
especially when stochastic parameters are not concentrated and the estimated cost function has a complex behavior. 

In Ref. [IB] we introduced an approach to TSSO which resembles Survey Propagation (SP) P^21j . combining 
Belief Propagation (BP) and its /? = oo version, also known as Max Sum (MS, see e.g. [22 for an extensive review on 
these two methods). The approach is partly analytic and it allows to build an algorithm to optimize the expectation 
of a stochastic cost function by estimating the statistics of its minima, without resorting to explicit (and costly) 
sampling. The method was applied to a stochastic version of the matching problem with independently distributed 
stochastic parameters and it resulted in a distributed message passing algorithm dealing with this TSSO problem. 
The algorithm was shown to perform very well in a stochastic bipartite matching problem by extensive numerical 
simulation. 

In this work we discuss the details and the generalization of the method and of the resulting algorithm. Beside the 
application to the stochastic bipartite matching problem introduced in (18j , we generalize the technique to deal with a 
relevant TSSO version of the Maximum Independent Set problem, which consists in finding the maximum independent 
set in a graph, when the node's contribution to the total weight is uncertain but its distribution is known. This is a 
problem that could arise in a communication network with some interference constraints [23i. Consider a network of 
devices communicating with a central server with the following constraint: two neighboring devices in this interfrence 
network can not transmit information at the same time because the data may be lost. Therefore, if a set of devices 
transmit at the same time, they need to be an independent set of the network to have a successful transmission. In 
addition, suppose that the devices have different transmission rates and the server wants to choose an independent 
set of the network with maximum transmission rate. This is a maximum weight independent set problem |24j . If 
there is a sort of uncertainty in the problem, for example in the transmission rates, we have to deal with a stochastic 
optimization problem. 

The two-stage problem can easily be generalized to a multi-stage problem where in each stage some of the stochastic 
parameters are revealed and one has to assign a subset of the variables. This is a more difficult and less studied problem 
in the field. The method presented here can in principle be extended to study stochastic multi-stage problems. This 
generality comes at a cost: every successive stage involves dealing with increasingly complex distributions. In this 
paper we also consider an heuristics obtained by reducing a multi-stage problem to a sequence of two-stage problems 
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that seemingly gives a very good approximation in this particular case. 

The paper has the following structure: we define the problem in section [llj in section III we give the general cavity 
approach to solve a two-stage stochastic problem; in section |IV| wc apply the method to the stochastic independent 
set problem; finally, in section [V| we apply it to the stochastic matching problem. 



II. PROBLEM DEFINITION 



In general terms, the problem we study is defined by an energy function £(xi,t2,X2) depending on two sets of 
decision variables xi,X2 and a set of independent stochastic parameters t2. The objective is to optimize the average 
outcome of the following process: first xi is chosen, then t2 is extracted, and then X2 is (optimally) chosen. 

That is, the first step consists in fixing xi such that the following average energy is minimized: 

x^ = argminEt, min£(xi,t2,X2). (1) 

XI ' X2 



In certain cases, a greedy algorithm may solve the above problem by replacing the second stage stochastic parameters 
with their expected values, that is 



^gicc y _ a,rgminmin£(xi, (t2) ,X2). (2) 



greedy 

Once the stochastic parameters t2 are extracted, it then solves for the second stage variables, given xf'^'"''^'*', t2. This 
is a simple but very naive algorithm to solve a two/multi-stage stochastic optimization problem. On the other hand, 
a clear lower bound for the optimal energy is obtained when all stochastic parameters are known at the beginning. 
We call this the offline solution x°®™" and it is computed by minimizing the whole energy. In the following, we shall 
always compare the results with the above greedy and offline solutions. 



III. CAVITY APPROACH: PASSING SURVEY OF SURVEYS 

Generally speaking, the method we propose consists in computing the chain of operations in ([T]) by performing the 
minimizations with the help of MS and the expectation with BP. The scheme we propose consists loosely in following 
the procedure below: 

1. Call ljv/s(ni, t2) the indicator function for the MS equations for the inner minimum in ([T]) as a function of MS 
messages m, and build the following distribution Q(t2,m) oc P{t2)lMs{^,^2)- Separately, compute the MS 
expression for the minimum energy £*(xi,t2;m) on a MS fixed point m. This will be needed in itcmjsj 

2. Obtain the BP equations for Q, with message vector Q. These equations can be considered as SP equations 

3. Treat the expression for the minimum energy £*(xi,t2;m) from the MS of the first step as an observable and 
compute an expression f *(xi; Q) for its average (£*(xi,t2; ni))Q as a function of the BP messages. Up to here, 
the variables xi have been considered constant. 

4. Finally, employ MS again to find the minimum of £*(xi;Q) over both Q and Xi, where the messages are 
constrained by the BP equations. 

We will now explain more in detail how this is done. Consider a system of interacting variables V — {i\i — 
1, . . . , N} = Vi U V2 with interaction set E — {a\a — 1, . . . , M}. Suppose that the energy function is 

f (xi,x2;t2) ^^ea{xoa;ta) + ^ Ciixf, t^) , (3) 

a i 

with variables {xi} and local energies {e^, Ca}- The stochastic parameters {ti, ta} are independent and obey a product 
distribution P{t) — Y[iPii^i)Y[aP'>-it'>')- Here da denotes the set of variables contributing in the energy function e^. 
Similarly, we use di for the set of interactions depending on Xi. For fixed xi and t2, the statistical physics of the X2 
variables at finite temperature T2 = 1//32, is given by the following partition function: 



^2[xi;t2] =^e-'^^^("-''='*^). 

X2 



(4) 



4 



^e-fee.(..;t.) -Q ^^^^(^^)^ (7) 



In the Bethe approximation we write the corresponding free energy as 

i^2[xi;t2] =^AF, + ^AF,- ^ d.F.a. (5) 

a i (ia),i£V2 

The local free energy changes are computed from the cavity marginals ipi^a{xi) and ipa^iixi) 

g-fcAi=^. ^ ^g-/32ei(a;i;ti) 
g-/3.AF.„ = (8) 

satisfying the BP equations 

(X ^ e-^^^-^^^-*") J] Vj^a(Xj) = V^a^,. (10) 
xaa\i jeda\i 

Notice that variables in the first set are fixed, so for these variables ipi^aix) = Sx,xi and AFi = ei{xi; ti). The right- 
hand side of should be understood as a definition of the functions '!/'i->a({^6-i.i, i> € 9i\a}, ti) and ^a^i{{'4'j'^a, j € 
da\i},ta). 

The Bethe approximation to the free energy is asymptotically correct as long as the interaction graph is locally 
tree-like and we are in a replica symmetric phase. For the sake of simplicity, we will assume that this is the case 
and that the BP equations have a unique fixed point. Obviously, when this is not true, assuming replica symmetry 
breaking and employing the correspondent RSB equations would give a more accurate treatment of the system. 

To get the minimum energy configuration we need to take the limit /32 — oo. Let us assume that in this limit the 
BP messages scale as e^^™'^"(^'' = ipi^aixi) and ^P^ma^iixi) _ ^^^.(^Xi), defining new cavity messages nii^a and 
vria^i- Starting from the BP equations one can easily derive the so called Max Simr equations 

mi^a{xi) ^ -ei{xi]ti) + ^ mb^i{xi) = rhi^aj (11) 

nia^iiXi) = max < -ea{xaa;ta) + ^ nij^aixj) > = TOa^i- (12) 
[ jeda\i J 

Again for the variables in the first set 

rrii^a = '^ogS{xi;-), ieVi. (13) 



where log(O) — — oo. To fix a second stage variable we need the local MS messages computed as in (11), but 
including all the neighbors of i in the sum. With our definition of MS messages, Xi = argmaxmi(a;). 

Notice that still the messages depend on the stochastic parameters t2 . As before we assume that for each realization 
of t2 the MS equations have only one fixed point. The statistics of the MS messages among different realizations is 
given by the joint probability distributions Qi^a{fni^q',ti) and Qa^i{ma~^i',ta) satisfying the following equations 

Qi^a{mi~^a;ti) (X pi(ti) ^ Yl Qb^i{mb^i;ti,)S{mi^a - mi^a), (14) 

)'XPaita) ^ Y{ Qj^aimj^a;tj)S{ma^i~ma^t)- (15) 

{tj ,nij^a \j^Oa\i} j^da\i 

Then, the marginals over the MS messages are simply obtained by summing over the stochastic variables 

Pi^airrii^a) = ^ Qi^a{mi^a;ti) = Pi^a, (16) 
ti 

Pa^i{ma^i) = ^Qa^i{ma^i;ta) = Pa^i- (17) 
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Clearly for fixed first-set variables we have 

Pi^ainii^a) ^ S{mi^a -log S{xi;-)), ieVi. (18) 

We will refer to the above equations as the BP-MS equations. The average energy can be computed using the Bethe 
free energy, that is 

£:i(xi) = ^(AO+^(Ae,)- (19) 

a i {ia)AeV2 

where the average of Acq = limp^^ac ^Fa, Ae^ — liin^2_j.oo Ai^i and Ae^a — lim;32->oo A.F'ia are taken over the 
stochastic variables t2. 

We should mention here that when there exist many Max Sum fixed points the above average energy is computed 
with a uniform measure over the fixed points. Suppose we have A/ta Max Sum fixed points for given Xi,t2. For 
any fixed point mt^ , consider the Bethe minimum energy £i (mt^ ) . The average energy computed by the surveys 
Pi_>a(TOi_).a) and Pa^i{ma^i) IS indeed 



t2 \ *2 / 



(20) 



It should be noted that, in the case of multiple fixed points, this expression is different the one that would have been 
obtained with the loose procedure described at the beginning of this section. Indeed that would have resulted in 

which is the expression corresponding to item |3] of the description. In particular, £" may not coincide with £i even 
in the case in which all the fixed points have the same energy. Besides being less informative for our purposes, the 
computation of £" is more involved than the one of £[, needing the propagation of joint messages Pi^a{mi^a, ma^i) 
and Pa^i{ma^i,'m'i_^a)- In the case of a single fixed point, such messages simplify as they depend only on the 
argument of the forward direction. 

Now we are ready to deal with the first stage variables. The partition function for this subsystem at finite temper- 
ature Ti — reads 

Zi = ^e-^i^i(''i^ (22) 

xi 

We recall that the MS messages needed to compute £i(xi) depend implicitly on xi. In order to make the average 
energy a local function, we introduce the Pi^aifn-i^a) and Pa^i(ma~^i) as new variables in the partition function 

xi,{Pi^a:Pa^i} ieV2 a£di a i£da 

As before, the marginals of Pi^a and Pa^i can be computed by the Bethe approximation. Let us first write the cavity 
messages related to the second set variables: 

^^^a{P^^a)0, ^ ^-^(Ae,^^ v]/,^, (p,^^)^ (p^^„ _ p^^„) , (24) 

{Pb^z\bedi\a} bedi\a 

^a^^{Pa^^)0C ^ ^-PiiAe.^.) Vj/^-^, (P,-^„)5(P,^, - F,^,) , (25) 

{Pj^^\]eda\i} jeda\i 

where (Aei^a) and (Aca^i) are the average of cavity energy shifts, 

Ae,^a= lim AF.^a, AF.^a = AF, - AF,a, (26) 

;32-i-oo 

Aea^^^ lim AFa^^, AF,^,; = AF, - AF,,. (27) 

/32->-oo 
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Note that the energy term Aei_i.a (resp. Aea->.i) in (24 1 does not depend on the backward message Pa-j-i (resp 



Pi^a)- This is exactly the reason for the regrouping of the energy terms in (261, and it is crucial in order to avoid 
correlations between messages traveling in opposite directions. The cavity messages for the variables in the first set 
are a bit different from the above equations, due to the asymmetric form of the average energy, 

v&.^alx,) cx e-^^^-(--*-) n (28) 

^a^^{x^)o, ^ ^'^^^^'^^^ J] "^J^aiXj) J] *J^a(P,-^a). (29) 

{xj\j<£da\i,Vi},{Pj^^\jeda\iy2} jeda\i,Vi jeda\i 

These finite temperature equations (BP-BP-MS) could already be used to extract useful information about the 
phase space of the problem. However, in order to find the configuration minimizing the average energy, we have 
to take the zero temperature limit /3i — ;> oo. Again we work with the following scaling: ^'^-j.a = g/SiAfi^a g^j-^^j 
^a^i = e'^i*^"^'. In this way we obtain the Max Sum equations (MS-BP-MS) in the top layer 

M,^a{xi)^-ei{xf,ti)+ Mb^,{x,) ieVi, (30) 

Mi^a{Pi^a) — max ^ < )+ V Afb^,(Pfc^,)> i&y2, (31) 



{Pb^^\bG^^\a}■.Pa^,=Pa 

and 

Ma^,{xi) ^ max { -(Ae^) + V Afj-^„(a 



b^di\a 



J2 M^^aiPj^a)} leVu (32) 

jeida\i,V2 



Ma^,{Pa^i) = max { -(Ae^^,) + V M,^,(a;,) -f V Mj^,(P,^,) \ i e V2. 

{P,^^\3€daV,V2}:P.^a=P.^. ^ je^aV.^i jGdaVy^ ) 

(33) 

The above messages should be normalized by subtracting the maximum value of the unnormalized message in each 
case. Starting from random initial messages {Mi^a, Ma-^i} we update them according to the above equations. At the 
fixed point the local messages Mi determine the solution to the first stage variables. Introducing a small reinforcement 
to the equations would help the algorithm converge more easily to a polarized solution j28) . To this end we modify a 
bit the equations for the first set variables as 

M,^a{x^) = -e,(x,; U) + ^ Mt^^ix,) + pM,{x^), (34) 

b^di\a 

M,{x,) = -e,{xi-U) + ^ Mb^,{x,) + pMi(xi), (35) 

where p > is the reinforcement parameter. 

In the next sections we will make the above points more clear by studying two problems: a stochastic independent 
set problem and a stochastic matching problem. 



IV. THE TWO-STAGE STOCHASTIC INDEPENDENT SET PROBLEM 



We consider a weighted graph G — (V, E) with node set = Vi U V2 of size N ^ edge set E and weights Wi on the 
nodes i = 1, . . . , Af. A configuration of nodes x € {0, 1}^ defines an independent set if XiXj = for any edge in E. 
The weight of an independent set is the weight of nodes belonging to the set, i.e. W = '^^XiWi, and a maximum 
independent set has the maximum weight among all the independent sets. A stochastic version of this problem is 
obtained by introducing independent stochastic parameters ti G {0, 1} representing the nodes that contribute to the 
total weight. Given the probability distributions {pi(ti)\i € V2} we are to find an independent set with maximum 
weight W = '^iUxiWi, after realizing t2. In terms of the previous section notations: ei{xi]tj) = —tiXiWi and 
Ca = eij{xi,Xj) = SxiXj,i X 00. That is we have deterministic hard interactions and no stochastic parameters ta- 
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In the following wc shall work with Erdos-Rcnyi (ER) random graphs. The subsets Vi and V2 are chosen randomly: 
a node can belong to the first or the second subset with equal probability. For the sake of simplicity we shall assume 
that all the node weights are the same, say Wi = 1 for any i. The probability distributions Pi{ti) = Pj^t^,! + (1 —Pi)Stifi 
define the amount of uncertainty in the problem. When all Pi = ot Pi = 1, there is no uncertainty and we recover 
the deterministic problem. In the other extreme we have all = 1/2 that is the most uncertain case. 



A. Message passing solution 

For a fixed xi which is an independent set of the subgraph induced by Vi , we write 

^2[xi;t2] = 5] n ^xix^oe'^^^^*^"^ (36) 

X2 {ij)eE 

The BP equations for this problem are 

il>i^j{xi = 0) ocl, (37) 
iJi^j{xi = l)(xe^'*^ Y[ iJk^i{xk = 0), (38) 

which can be understood as messages from a variable to a constraint. The equations converge on an ER random 
graph for any j]^ < 132, which depends on the average connectivity. For smaller temperatures the replica symmetry 
assumption is not anymore correct. Then we obtain Max Sum equations, which for binary variables simplify slightly 
as messages can be parametrized with a single real number m(l) — m(0): 

nii^j = rrii = {2xi — 1) x 00 i G Vi, (39) 
mi^j=ti- ^ max{0, nik^i) = rhi^j i £ V2. (40) 

These equations converge on an ER random graph as long as the average degree is smaller than cxp(l). In the following 
we will always use these equations to find, for example, the greedy and offline solutions. To improve the convergence 
the algorithm for large degrees one can introduce very small noises in the weights Wi and use the reinforced equations. 
The distributions of the MS messages over stochastic parameters t2 are given by 

Pi^jim^j) oc '^PiiU) ^ Pk^i{mk^i)6{mi^j - rrn^j) = Pi^j. (41) 

ti {mk-n\kedi\j} kedi\j 

The equations for Pi^j{+1) are simply written as 

Pi^j{+l)=Pi II (1 - Pfc^i(+1)) , (42) 

with no need of the other probabilities. The normalization condition gives the probability of having zero and negative 
messages. The above survey can be used to compute the average energy fi(xi) for a given configuration of the first 
stage variables: 

£i(xi)= ^tia;i+^(Aei)- ^ (Aei,). (43) 

ieVi ieV2 (ij)eE,(i,j)eV2 



The average energy shifts are 



with 



(Aei) = J2 ^iPii^i) = Pii^i = +!)> (44) 

mi>0 

(Ae.,-) = (Ae,) - (Ae.^,), (45) 



{Aei^j) = ^ mi^jPi^j{mi^j) = Pi^j{mi^j = +1). (46) 

rrii^j >0 
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Let us compare the above average energy with the one obtained by samphng the stochastic parameters. More 
precisely, we generate a large number S of samples t| from distribution IliGVi ^^'^ minimum energy 

configuration using the Max Sum algorithm. Thus, the average energy can be written as 

^l'^(^i) = |E^i(^i'^2;t^)- (47) 

s 

In figure [T] we compare the two average energies. 

At finite temperatures the top layer BP equations read 

«',^,(xO(X J2 n "^k^^M n '^k'^^{x^) l€Vi, (48) 

{xi,\kedi\j,Vi}:XiXk=0 k£di\j,Vi k'edi\j,V2 

^,^j{Xj)^ J2 e^'^^"'^ n n '^k^^iPk^^) ieV2,jeVi, (49) 

{xk\kedi\3,Vi}, kedi\j,Vi k£di\j,V2 

{Pk^.\kedi\j,V2} 

^^^,iP^^J)^ e'3^<^^'-^-> H ^k^^iXk) n '^k^^iPk^^)S{P^^J-P^^J) * G 1^2 , ^ G V^2 (50) 

{xk\kedi\j,Vi}. kedi\j,Vi kedi\j 

{Pk^Akedi\j} 

Notice that for variables in the first set the sums are restricted by the hard constraints. Moreover, since the energy 
shifts depend only on Pi_j.j(+1), we need just to consider this probability in the equations. Therefore, the relevant 
variable in the messages is Pi^j{+1). 

The top layer MS equations are obtained as before by taking the /3i — > oo limit in the finite temperature equations: 

M,^j(a;,)= max I Ux^ + V Mfc_^,(xfe) + V Mk'-,^{x^)} ieVi, (51) 



M,^j{xj)= max < (Ae,) + V Mk-,^{xk) + V Mk^,{Pk^i)) teV2,jeVi, (52) 

{xk\kSidi\j,Vi} , '—^ 

Mi^jiPi^^)^ max \{Ae,^j)+ V Mk^,{xk) + V Affe^»(Pfe^O > ieV2j£V2. 

{xk\k£di\j^Vi} , I — — I 

{Pk^,\kedi\].V2}:P,-,,=p,-,, ^ feeaAi.Vi kedr\j..V2 ) 

(53) 

One strategy to solve the above equations is to work with discrete variables taking a small number of values. Let 
us assume Pi^j{+1) takes B + 1 discrete values in [0, 1], that is Pi^j{+1) = n/B for n e {0, 1, . . . ,B}. Given this 
binning, one could try to solve the equations by summing exhaustively over all the possible configurations of the 
variables. The time complexity of this computation grows exponentially with the degree of nodes as B^^. We can do 
better than exhaustive sum by using the distributive nature of the equations. When i is in the first set, the input 
variables {xk\k £Vi} are decoupled for different neighbors k and we need only to take care of the hard constraints. 
When i is in the second set, we have to sum over all the values of the input variables giving rise to the specific output 
variable Pi^j{mi^j = +1), which depends only on the product of (1 — Pk^i{rak^i = +1)) for different k € di\j and 
Pi{ti = 1). This can be done by splitting the whole sum into smaller ones such that at each step we get a convolution 
of the new messages and the sum over previous messages, that is 

FU.iP) - max I {F!-]i^^),Mk,^.iPi) I , (54) 




where I goes from 1 to — 1 = — 1. In the last step we update the message as 

Pj^ 
P 



M,^,(P,^,) = (Ae,^,-) + Ffx/{P - ^). (55) 



Now we need B^di operations to update a cavity message. The time complexity of this algorithm grows linearly with 
N for finite degree graphs and finite number of bins. In Figure [T] we display the average weight of independent sets 
for a fixed configuration of the first stage variables computed by the above equations with discrete variables. This is 
just to be sure that by summing over the discrete surveys in these equations we recover the correct average energy. 
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FIG. 1. The 2-stage problem: the average weight of independent sets, given xi, obtained from the surveys and by samphng. 
The variables in the first set are chosen randomly with probability 1/2. The stochastic parameters are in the most uncertain 
state, i.e. Pi{ti = 1) = 1/2. Also the first set nodes contribute to the total weight with probability 1/2. The data are the result 



of averaging over 100 instances of random graphs, weights and stochastic parameters. The size of the graph is A'^ 
the average degree and B denotes the number of bins. The number of bins increases from top to bottom. 



10* 



However, the above equations are indeed to find the optimal configuration of the first stage variables as described 
before. Figure [2] compares the performance of the algorithm with the greedy and offline solutions. The figure also 
shows how the maximum weight solutions obtained in this way depend on the number of bins. 

B. Monte Carlo approach: sampling + local search 

The two stage stochastic problem can in principle be studied by a Monte Carlo algorithm. Given a problem instance, 
we extract S samples of t2 from the probability distribution Iligya^'^^*)' ^'^^ ^ fixed xi and a sample t|, one finds 
Xj that minimize the total energy 



— argmin^r(xi, X2; tj). 

X2 



(56) 



Then we find x^ to minimize the average total energy 



arg min- > £(xi,X2;t2). 

^1 s=l 



(57) 



So far, the only difference with the previous sections is in replacing the average energy with an average over a finite 
number of samples. Then we have to choose an algorithm to solve the above two optimization problems. Here we 
use a mixture of Max Sum and zero temperature Monte Carlo; Max Sum to find x| and Monte Carlo to find x^. 
Given {t||s = 1, ... ,5} we start from xi = 0. Then we select randomly a node i from Vi and fiip Xi to x"*^™. This 
results to a change in the average energy A£ = ^ X]f=i Notice that to compute we have to find Xj and we 

do this by using the Max Sum algorithm. In a zero temperature Monte Carlo we accept the change to xf^^ only if 
AS < 0. We repeat the above steps until the algorithm finds a local minimum of the average energy function. In an 
iteration of the algorithm all the first set variables are selected in a random sequential way. In figure |3] we compare 
the outcome of this algorithm with the greedy and offline solutions. The algorithm is computationally expensive, and 
therefore in order to obtain a good statistic we had to restrict ourself to a small graph. The time complexity of these 
algorithms increases as SN'^ for finite degree graphs and when a finite number of iterations are enough to reach a 
good approximate solution. Here we assumed that the number of first stage nodes scales with N. Notice that, instead 
of zero temperature Monte Carlo we could use a more sophisticated algorithm like simulated annealing, but it would 
be more time consuming. 
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FIG. 2. The 2-stage problem: comparing the weight of independent sets obtained by the messages passing algorithm with 
the greedy and offline algorithms. The variables in the first set are chosen randomly with probability 1/2. The stochastic 
parameters are in the most uncertain state, i.e. Pi{ti = 1) = 1/2. Also the first set nodes contribute to the total weight with 
probability 1/2. The data are result of averaging over 100 instances of random graphs, weights and stochastic parameters. The 
size of graph is A'' = 10^ and c is the average degree. The number of bins B increases from bottom to top. 

C. Multi-stage stochastic optimization 

In this section we see how the message passing method can be generalized to study a X-stage problem. With 
obvious notations, the problem at stage I is to find the partial configuration Xj* which minimizes the expected value of 
the final cost function f (x,t), given the previously assigned variables Xi, . . . ,xj_i and the previously set parameters 
ti, . . . , t;: 

x^ = arg minEtj^j min • • • Et^- min5(x, t). (58) 

A greedy algorithm solves the problem at stage I by minimizing the total energy function replacing the unknown 
stochastic variables with their expectations. The offline solution is computed by minimizing the whole energy given 
ti, . . . , tK- 

Starting form the bottom layer BP equations at temperature Tk we could compute ^'f^j depending on xi, . . . , 
and ti , . . . , tx- These messages contain all we need to know about the variables X/^ . We denote the corresponding Max 
Sum messages by M^j. The probability of these messages over the stochastic parameters is given by P^^{M^). 
We could use these surveys to write the next layer MS equations M^MP.^^) which give the information necessary for 
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FIG. 3. The 2-stage problem: comparing the weight of independent sets obtained by the local search algorithm with the other 
algorithms. The variables in the first set are chosen randomly with probability 1/2. The stochastic parameters are in the most 
uncertain state, i.e. pi{ti = 1) = 1/2. Also the first set nodes contribute to the total weight with probability 1/2. The data 
are result of averaging over 100 instances of random graphs, weights and stochastic parameters. Size of graph is N = 500, c is 
the average degree and number of samples is 10. 



fixing variables yiR-i- Similarly we get probabilities over the stochastic parameters tK-i in the surveys P^^^ i^i^j ) 

.jK-2,pK- 



and these give rise to the new set of MS equations (Pi^j^)- summary, at stage I we compute the MS messages 



as 

mU^(x,)^ max J-(Ae(+i)+ ^ M^^,(P^+i) I z e l^.>;,j G (60) 

mU^{P\^\)= max |-(Ae4+i,.)+ ^ ^.3 ^yv>u (61) 

where for variables in stage / and before that, the messages M\^^(P\'^j) are concentrated on Xi. 
The messages statistics are given by 

PU,{MU^) <^Y.V^{i^) E n PU^{Mi^;)mU^-MU^) Z € , (62) 

PU,{MUi)<^ J2 n PUiMi^MMLj ~ MUj) ^eVv>i. (63) 

For variables fixed in the previous stages, Pl^j{M^^j) is concentrated on Xi. As before we used O to denote the 
corresponding equation for quantity O. And finally, the average cavity energies are computed by 

(Ae,i)= Y Ae\Y[PUiMU), (64) 

(AeL,)= E ^^'^1 n PL^{MU)■ (65) 
{Ml^^\k€ai\j} ked2\j 
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where 

Ae^= max -(Aer) + ^ AfU(^^X'.) ^ (66) 

AeL, = , i^ax <j -{Ae^)) + ^ • (67) 

^P^t:.\'^eO^\,} [ ^^^^ J 

Notice the nested nature of the messages, which makes an exact treatment of the above equations nearly impossible 
for large K. However, the efficiency of the algorithm for = 2 allows to use it to obtain an approximate solution to 
the iiT-stage problem with K > 2. A simple heuristics consists in finding 

= argminEt,+i...tj^, min £(x,t), (68) 

by repeatedly applying the algorithm for a two stage problem. Changing the order of minimization and expectation at 
each stage, we produce lower bounds for the expected value of the energy. As K increases, the approximation effects 
are accumulated, resulting to a suboptimal solution. However, as figure |4] shows, we get still better results than the 
greedy algorithm. 



V. THE TWO-STAGE STOCHASTIC MATCHING PROBLEM 



As a second illustration of the method described in Section let us consider the following problem, which is 
a variant of the stochastic two-stage bipartite matching problem introduced in [51 1251 [26j , where it is shown to be 
NP-complete, and for which the main results have already been published in (TB]. We are given a bipartite graph 
G = {L, R; E) with L further partitioned in Li and L2, and for each I2 € L2 a real number pi^ g]0, 1[. The objective 
is to find a maximum-size matching under the following two-stage setup: the vertices in Li are deterministic, and 
they must be matched in the first stage of the problem; the vertices in L2 are stochastic, i.e. they may or may not 
be available for matching, and the available ones must be matched in the second stage. After the first-stage vertices 
have been matched, the available vertices for the second stage are extracted (independently for each vertex I2 € L2 
with probability pi^), and then the second-stage optimization is performed. Therefore the optimization in the first 
stage must be done knowing only partial information (i.e. the probabilities p = {pi^, I2 G ^2}) about the availability 
of the second-stage vertices, and once the available vertices in L2 are known in the second stage, the matching of the 
first-stage vertices cannot be modified. 

We introduce two sets of binary variables, xi = {xi^r G {0, 1}, (^i^) £ E : li e Li} and X2 — {xi^r G {0, 1}, {l2r) G 
E : I2 £ L2}, to represent the possible AI C E, with xir = I if and only if (Ir) G M, and a set of binary stochastic 
parameters t — {ti^ € {0, l},^2 G -^2} with ti^ = I if and only if I2 is available for matching in the second stage, 
so that P[t;2 — 1] — Ph (notice a slight change in the notation compared to the previous section, which makes it 
more suitable for this specific problem). We define an energy function f (xi, t, X2) counting the number of unmatched 
vertices among the available ones. The first-stage problem consists in finding 



subject to the matching constraints 



x^ = arg min Et min £ (xi , t , X2 ) 



' ^ x,^ < I (Vr e R) 

ledr 
redh 

Xl^r < tl2 (V^2 e L2) 



(69) 

(70a) 
(70b) 
(70c) 



where dr = {I £ L : (Ir) £ E} and similarly for dh and dh- Once Xi and t are determined, it is straightforward 
to solve the second-stage problem. The difficulty of the problem stems from the fact that Et minx2 i^^(xi,X2,t) has 
a highly non trivial dependence on xi. In order to overcome this difficulty, we shall use the cavity method to first 
compute the minimum energy relative to X2 for fixed Xi and t, and then to compute the average over t of this quantity. 

In order to simplify the notation, in the following we shall always assume (unless explicitly specified differently) 
that I denotes a vertex in L, li a vertex in Li, I2 & vertex in L2 and r a vertex in r, possibly restricted to the neighbors 
of some given node, and that {Ir) denotes an edge of the graph, {hr) an edge with li £ Li and so on. 
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FIG. 4. The 10-stage problem: comparing the weight of independent sets obtained by the messages passing algorithm with the 
greedy and offline algorithms. The variables in each set are chosen randomly with probability 1/10. The stochastic parameters 
are in the most uncertain state, i.e. pi{ti = 1) = 1/2. Also the first set nodes contribute to the total weight with probability 
1/2. The data are result of averaging over 100 instances of random graphs, weights and stochastic parameters. Size of graph 
is A'^ = 10* and c is the average degree. Number of bins B increases from down to top. 



A. Message passing solution of the second-stage problem 



Once Xi is determined and the stochastic parameters t are set, it is straightforward to find the optimal X2. We shall 
now show how to do this is using MS, as discussed in [2^. For each edge (hr) G E we introduce the MS messages 
mi^^r a-nd mj-^i^- Notice that since the variables in the problem are defined on the edges of the original graph G, 
while the clauses are defined on its vertices, there is no distinction between "clause to variable" and "variable to 
clause" messages. The MS equations are then 



-maxf— 1, max mp if xi. = for each li £ dr 

oo otherwise 

max[— 1, max rrir'^i^] if = 1 

r'€dl2\r 



-OO 



otherwise 



(71) 
(72) 
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where the condition in ( 71 1 derives from the matching constraint ( 70a ) and the condition in ( 72 1 derives from the 

1, 



matching constraint ( 70c 
— oo with —1 



Notice that the presence of max[— 1, • • • ] in these equations makes it possible to replace the 
This in turns allows to make a very useful simplification of the notation: we shall define the messages 
m also on the edges connected to the vertices in Li, with the convention that m;^^.^ — rrir^i-^ = 1 if xi-^r = 1 and 
mi-^^r = iTir^h — ~1 if ^hr = 0- ^ is casy to see that the previous equations then become 



— maxf— 1, max mii^r] 

l'edr\l2 

maxf— 1, max rrir'^iA ii ti. = 1 

J r'edhV 



-1 



otherwise. 



(73) 
(74) 



These equations can be solved by iteration, and knowing the value of the messages at the fixed point allows to 
compute 



£*(xi,t) = min£(xi,t,X2) 



max[— 1, maxTO^-i.;] — max[— 1, maxm/^j.^] + max[0, rrii^^ + m^^i] 



ledr 



(75) 



{Ir 



This expression is obtained by taking the zero-temperature limit of the first line of (21) in [23 (notice that the 
simplified expression in the second line cannot be used in the zero-temperature limit, because it depends on an 
unresolved indetermination; this can be verified easily on a star-shaped or on a linear chain graph). The dependence 
of this expression on t is not explicit, and it derives from the matching constraints (70c I through the update equations 



(741. 



For the sake of our computation, it is important to analyse the nature and the number of the fixed points of the 
MS equations ([73) [74]). It is easily seen that these equations are closed for messages with support in {—1,1}, and 



also for messages with support in {—1, 0, 1}, so we can expect the fixed points to have support on either one of these 
sets. Fixed points with other support can exist for finite-size instances with appropriate initial values of the messages, 
but we have verified numerically that they disappear in the infinite-size limit, and we shall ignore them. A detailed 
analysis of the fixed points obtained in the infinite size limit is carried out in [57] for the case of random graphs with 
average connectivity c, where it is shown that the fixed points with support in {—1,1} (which we shall refer to as 
"two- valued" fixed points) correspond to replica-symmetric states and are correct for c < e, while the fixed points with 
support in {—1, 0, 1} (which we shall refer to as "three- valued" fixed points) correspond to replica-symmetry-breaking 
states and are correct for c > e. In the remainder of this Paragraph we shall extend that analysis to the case of 
bipartite random graphs, which is of interest for us. 

Let us consider a uniform ensemble of instances with poissonian degree distribution and average degree c (we shall 
consider balanced bipartite graphs for simplicity, so that the connectivity of left-hand nodes and that of right-hand 
nodes coincide), in the infinite-size limit. We shall denote by P^^j^ the average fraction of messages nrii^j. that are 
equal to +1 and by -P£L^^ the average fraction of messages TO/_j.r that are equal to —1, and similarly define P^_^]^ 
and Pa , r . From their definitions and from the MS equations (73 74 ) one obtains that these quantities must satisfy 



the following equations: 



E 



fc! 



1- (1-P, 



R^Ll 



1 — cP+ 

1 - e '"■^H^i 



R^L 



P; 



R-^L 



1 - e-^Pi^^ 



(76) 

(77) 

(78) 
(79) 



which implies that each of the quantities P^_j.^, (1 — P^_^jj), PfjL^i and (1 — Pfj_^^) must satisfy the equation 

X — exp [— cexp(— ex)] . (80) 
The crucial difference between the case we consider and the non-bipartite case considered in |27' is that here P^^ji 



can be different from P^_^i^ if (801 admits more than one solution. On the other hand, it is always possible to find 



a solution with P, 



always present. 



L^R 



1 — P]^_^j^ (and then P^^j^ = 1 — Pr-^l)^ expect that the two- valued fixed point is 
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In fact, for c < e (80l admits a unique solution, and the distribution of the cavity messages will be unique and 
satisfy P^_^ji = 1 — Pl^r ^-nd Pr^j^ = 1 — Pr^l- This will correspond to an essentially unique fixed point of the 
MS equations: it is possible that some (small) isolated components admit several fixed points, multiplying the total 
number of fixed points, but the extensive component (which dominates the energy) will have a unique fixed point, 
and this will be a two-valued fixed point (i.e. with support in {—1, 1}). This statement is confirmed by numerical 
simulations. 

For c > e the situation is more complicated: (80) admits 3 solutions, and Pt.r, and Pr>.T can be different from 



1 — Pl^r 1 — Pr^l respectively. From ([78 79 ) we see that P^^j^ and Pr^j^ are determined from P^^j^ and 



Pl^r^ so the total number of solutions will depend on the number of solutions for P^^j^ and for 1 — Pj^^j^ only. 
Taking into account the constraint P^^n < 1 — P£^r, we see that the total number of solutions for the distribution 
of the cavity messages is at most 6. Some of these solutions however might correspond to negative values of the 
energy, and must therefore be rejected. As an example we have studied in detail the case c = 5, where the number of 
solutions with positive energy is 3, and they all have exactly the same energy. One of the 3 solutions is three-valued, 
and the remaining 2 are two- valued. On finite size instances, we have verified numerically that these 3 fixed points can 
always be obtained by chosing appropriate initial conditions. Their energies are close to each other, but not exactly 
the same, and the correct one is always the largest. 



We conclude from this discussion that the energy computed from ( 75 1 is correct for instances extracted with 



poissonian degree distributions with c < e and approximately correct for instances with c > e. It must be noted, 
however, that the reduced instance to be solved in the second stage is not necessarily poissonian, as the probability 
that a node in R is matched to a node in Li can be correlated to its degree. Moreover, it is possible that some small 
disconnected components have multiple solutions, that combined with the 3 solutions of the giant component give 
a larger number of fixed points, but these will always have energies that are approximately equal. We shall neglect 
these possible issues, comforted by our numerical results. 

In the following, we shall treat the two- and three- valued cases separately: we shall see that they give rise to different 
algorithms for the optimization over xi. Based on the above discussion, we expect that the two- valued algorithm 
will find the correct solution for poissonian instances with small connectivity, while the three-valued algorithm will 
do it for instances with large connectivity. We can also expect the two-valued algorithm to provide an approximate 
solution for large connectivity, and in fact we shall see that the average energy it obtains on the random ensemble of 
instances we have analysed numerically is almost exactly the same as that obtained by the three-valued algorithm. 



B. Message passing solution of the first-stage problem in the two-valued case 

1. Computing the average over t 



We shall now compute the average over t of the expression ( 75 ) 



f*(xi) =Etmin£(xi,t,X2) (81) 



X2 



for the two-valued case where the MS messages mi^^r and nir^i^ take values in {—1, 1}. For this purpose, we shall 
treat the quantities mi^^r, i^r^h and ti^ as random variables with a joint probability distribution 

Q(m,t) CX ]^Pi2(*i2) ^11-'^ ["l/2^r = mi^^r] 1 [™r^/2 = ^r^i^] (82) 
h ihr) 



where rhi^^r is a shorthand for f' G dl2 \ f}Tti2) defined as the right-hand side of (74) and similarly 

rhr^i2 is a shorthand for m^-^i^ {{"niv ^2 ^ 9r\l2}) defined as the right-hand side of ( 73 ). We then need to compute 



the average of (75 1 relative to this distribution. 

Within the cavity approximation, we follow the approach of Section |III| and introduce the cavity marginals 
Pi^^rifni^^r) and Pr^i2 ("Tir^h)- Sincc mi2^r and mr^i2 have support in {—1, 1}, we can parametrize these marginals 
with a single real number, Pi2^r = ^ l2-^r[1^l2^r = 1] G [0, 1] (and similarly for P^-j./^). The update equations for 
these cavity marginals can be obtained with the general method of the previous section (i.e. using BP for the dis- 
tribution Q(m,t)), but in this case it is possible to derive them in a more intuitive way as follows. Again, we can 
simplify the notation by extending the definition of Pi^r and Pr^i also to the edges connected to vertices in Li, with 
the convention that Pi-^^r = Pr-t-h = 1 if xi-^ = 1, and Pi^^r = Pr-fh = if — 0. 



We see from (73) that TOr-i.i2 is +1 if and only if all the incoming mii^r are —1 (for each /' e 9r \ I2), so that 



Pr^l2= n (1-^''^-)- (83) 
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Similarly, from (74) we see that mi^^r is +1 if and only if t/^ is 1 (which happens with probability PI2), and all the 
incoming rrir'^i^ are —1 (for each r' e dl2 \r), so that 



Ph n (1-^-' 

r'£dl2\r 



.h)- (84) 
Notice that by doing this (i.e. conditioning the probabilities to the values of ti^) we are giving the correct weight to 



all the fixed points even in the case where their number varies with t, as explained in Section III 



The coupled equations (83 84) can again be solved by iteration, and £*{xi) can be computed from the fixed point 
values of Pi^r and Pr^i- The contribution of a vertex I g L will be different from zero only if the node is available 
for matching, which happens with probability pi (setting = 1 if ^ e Li), and in this case it will be equal to 
— niax[— 1. max,.g9i rrir^i] which is +1 with probability rirea;(l ^ -Pr-s-/), and —1 otherwise, so that 



max[— 1, maxmr->.;l 

redl 



Pi 



Similarly, the contribution of a vertex r ^ R is, 

Et 



max[— 1, maxm;_^r] 



2 n (1 - ^'■^') - 1 

L r£dl 



2 n (1 - p^^r) - 1 

. ledr 



(85) 



(86) 



Finally, the contribution from an edge {Ir) G i? in ( 75 ) is max[0, mi^r+mr^i] which is +2 with probability Pi^r x Pr^i 
and zero otherwise, so that 



Et [max[0, mi^r + ?Tlr^i]] = '^Pi^rP, 



r^l ■ 



(87) 



We obtain 



redl 



E 



2 n (1 - Pi^r) - 1 



ledr 



2 ^ Pl^rPr 
(ir) 



1 — ■ 



Let us stress again that in this expression, even though Li and L2 vertices are treated in a completely symmetric 
way, the values of Pi^^r and Pr^i^ with li € Li are explicitly determined by xi, while the values of Pi^^r and Pr^i2 
with I2 S L2 are determined by Xi implicitly through the update equations (83 84). 



2. Solving for xi 



We are now in position to solve the first stage problem, namely finding xj defined as 

— argminEt min£(xi, t, X2) = argminf*(xi) 



(89) 



where £*{'x.i) is defined in (88), and where we remind that Pi-^^r — Pr^h = xi-^^r^ with xj^^ subject to the matching 
constraints (70a 70b I, while Pi^^r and Pr^i^ must satisfy the update equations (83 84), and we keep following the 



convention that li always denotes a vertex in Li and I2 a vertex in L2, and similarly for l'^^ I2 etc. 

Since f *(xi) is a sum of local terms subject to local constraints, we can solve this minimization problem with MS. 
We introduce the messages in terms of the cavity marginals Pi^r[Pi^r = P] and Pr^i[Pr^i = P]- For the edges 
connected to vertices in Li we define 



log 



P/ 



^[Ph- 



Fl,^r[Ph^r = 0] 



1] , Pi 

- log 



"fr^l, = log 



lAPr^h=^] 
lAPr^h=0] 



= log 



Fr^lA^hr = 1] 



(90) 
(91) 



which are simply real numbers. On the other hand, for the edges connected to vertices in L2 the probabilities P;^ 
and Pr^i2 can take any real value in [0, 1] and the messages, defined as 



^l,^r{P) = l0gPi,^,[P,,^, - P] + Cl, 
*,^,,(P) = l0gP,^,JP,^/, =P]+Cr 



(92) 
(93) 
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are functions of a real variable. The additive constants Ci^^r and Cr^i^ are set by requiring that maxp '^i^^r{P) = 
maxp ^'r-j.jj (P) = 0. For numerical purposes these functions can be approximated by a vector of real negative 
numbers corresponding to finite size bins for the values of P in [0, 1]. 

In order to obtain the update equations for '^i^r (and we must consider all the terms in (88) where the 



corresponding variable Pi^r (and Pr^i) appears, which will always be two: a vertex term and an edge term. Notice 
that the fact that we include the edge term in both the updates of '^i^r and "^r^i implies that we are defining these 
messages as the "variable to clause" ones. It will be important to remember this when deciding the value of xj^^ from 
the values of the fixed point messages '^i^^r and , since we shall have to subtract from '^i^^r + ^r^ii the edge 

contribution to the energy to avoid double-counting it. 

The update equations are obtained in a straightforward manner as follows. Let us begin with the equation for 
^jj^-j-r- The relevant energy contribution is 



8l,^r{Ph^r. {Pr'^l, , r' G dh \ r}) = 



2 n il-Pr'^w)-l 
. r'edh 



+ 2Pi^^rPr^h 



(94) 



where we recall that Pi^^r' ~ Pr'^h — xi^r' G {0, 1} for each r' E dli, and that these are subject to the matching 
constraint (70b). We obtain 



\og¥i^^r[Ph^r = P] = niax 

{Pr'^l-^(^{OA}:r'edli\r} s.t: 



p./^,, <1 



-£i,^riPAPr'^h,r' edh\r})+ ^ logP,,^i[n- 

r'£dli\r 



(95) 



When Pi^^r is 1, the matching constraint (70b) forces all the incoming Pr'^i^ to be (for r' € dh \ r), and the 
previous equation reduces to 



logP,,^,[Pj,^, = 1] = -1 + ^ logP,.^,JP,-^,, = 0] 



(96) 



When Pij^^r is 0, because of the matching constraint (70b) the incoming Pr'^i^ (with r' e dh \ r) can cither be all 
(as before), or one of them can be equal to 1, all the other ones being 0. We find 



logP,,^,[P/,^, = 0] = max i - 1 + logPr'^/i [Pr'^h = 0] 

I r'edh\r 



max 

r'edli\r 



1 + log Pr'^l, [Pr'^h = 1] + J2 log ^r"^h [Pr"^h = 0] 

r"^dli\{r.r'} 



By subtracting (97) from (96) we obtain 

^'ii^.r = — max 



0, 2 + max '^r'^h 



(97) 



(98) 



In a similar way we obtain the update equations for 4'/2-i.r(P/2->r)- The main difference is that now both Pi^^r 
and the variables associated to the incoming messages '^r'^hiPr'^h) will be real variables in the interval [0, 1]. The 
maximisation is therefore no longer performed over a discrete set of partial configurations for the incoming variables, 
but on a continuous range. Moreover, these variables have to satisfy the update equations (83 84 1, which are a 



constraint in the maximisation. The relevant energy contribution is again formed by a vertex and an edge term and 
is given by 



Sl^^riPh^r, {Pr'^h, ^' ^ 9^ \ r}) = pi 



2 n (l-^r'^/2)-l +2Pi,^rPr- 

[2(1 - Pr^i,)Pi,^r~Pl,] + 2Pi,^rPr^i, 
2Pi^^r -Ph 



(99) 

(100) 
(101) 



where we have made use of the update equation (84). Notice that, as expected from the discussion in Section III this 



energy term does not depend on Pr^i^ , thanks to the simplification between part of the vertex contribution and the 



18 



edge contribution. This is crucial to ensure that the messages ^i^^riP) and ^r^hiP) are uncorrelated, as required 
in order to apply the cavity method. In fact, in this case the result is extraordinarily simple, as it only depends on 
the outgoing variable Pi^^r and is independent of the incoming variables {Pr'^i^, r' g dl2 \ r}. The update equation 
is then obtained in a straightforward manner: 



= max 

{Pr'^l2^[^^'^]'r'&dl2\r} s.t: 



^£l,^r{P)+ "^r'^lJPr'^l,) 
r'£dl2\r 



-{2P ~ PI2) + max 

{P^,^,^<£[0A],r'<£dl2\r} s.t: 
P=Pi2n.'eai2\Ai-Pr'^,2) 



^ ^r'^l,iPr>^l2) 
r'€dl2\r 



(102) 



(103) 



It is important to realize that this constrained maximisation problem can be solved efficiently by exploiting the 
associativity of the maximum as follows. In order to see how, let us introduce the following simplified notation: 



gk{y) = niax 

{zi.,...,Zk}€[0,ir s.t.: 

2/=nj=i,...,fc(i-zfc) 



j=l,...,k 



(104) 



with y G [0, 1], which is of the form we need. Then 



9k{y) = max 

Zfc6[0,l-y] 



max 

Zfc6[0,l-y] 



/fe(zfe) + max 

{zu--.,Zk-l}&[0,l]''-^ s.t.: 
y/(l-2fc) = nj = i k-iC^-^j) 



]=l,...,k-l 



fk{zk) +9k- 



y 



(105) 
(106) 



so we can compute gk{y) iteratively, starting with go{y) — 0, in a time which is linear in the number of variables 
appearing in the maximum (i.e. linear in the connectivity of l2)- In fact, it is possible to compute all the messages 
^^ir^^riP) for r G dl2 in a time which is linear in the connectivity by computing (and keeping in memory) all the 
quantities (resuming to the full notation) 



9r{y) 



{P.' 



hr{y) = 



{Pr' 



max 

^i^e[0,l].,r' edl2- r' <r} s.t.: 

-■Ur'em2:r'<r(^~Pr'^l2) 



max 

^l^i^[0.1];r'edl2. r'>r} s.t.: 



E ^r'^l2{Pr'^l2) 



.r' ^dl2'- r' <r 



E ^r'^hiPr'^h) 



.r'^dh- r'>r 



which are computed iteratively as in (1061 and in terms of which the update equation (1031 becomes 



.(P) = -(2P-p;J 



max 
{yi,y2}6[o,il^ s.t.: 
P=pi2y iV2 



br(yi) + hr{y2)] . 



(107) 



(108) 



(109) 



By doing this, all the outgoing messages are computed by performing less than 3|9^2| one-dimensional maximisations, 
each of which has a time complexity proportial to the , the square of the number of bins that represent the cavity 
marginals. Overall the time complexity is proportional to per iteration. 

Let us now turn to the update equation for 5',._5.;j. In this case, the variable Pr^ii on the output edge is in {0, 1} 
and the corresponding message is a real number, while the variables and messages on the incoming edges {Ir) with 
I G dr\li are mixed: if / S Li, the variable Pi^r will be in {0,1} and the corresponding message will be a real 
number, and if Z € L2 the variable Pi^r will be in [0, 1] and the corresponding message will be a function with domain 
in [0, 1] and codomain in ] — c»,0]. The integer variables Pv^^r must satisfy the matching constraints (70a) 



that is 



must satisfy the update 



'i + Si' £dr\ii Pi'i-^r < 1- On the other hand, the constraint that the real variables Pi^ 
equations ( [8^ has no effect, since the values of Pr-s-Za foi' ^ach Z2 € 9r n L2 do not appear in the update. The relevant 
energy contribution is as usual formed by a vertex term and an edge term and is equal to 



Er^l,{Pr^h,{Pv^r, I' £ Or \ h}) 



I'edr 



Pl,^r) -l + 2Pi 



h- 



rP 



(110) 
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where we recall that Pi-^^r = Pr^h = xi-^r- The cavity marginals are then 



\0gFr^l,[Pr^l,=l]= max V logF i,^^r[Pl[^r = 0] + y2 ^l,^APl,^r)\ 

logP.^z, [Pr^i, = 0] = max l^dr\ h}) .B{{^i^r, ledr\ h})] 



(111) 
(112) 



with 



max 



1 - 2 n (1 - 

hedr 



l[edr\h hedr 



(113) 



and 

B{{-^i^r,ledr\h}) 



max 



1+ max i\ogFi'^r[Pl[^r = l]+ V logFi>>^r[Pl'^^r=0]] 

' \ l'^edr\{h.l[} / 



+ Yl 'i'h^riPh^r) 



hedr 



(114) 



Notice that all the maximisations in (111 112) are unconstrained and therefore performed easily. The outgoing 
message '^r->ii is then computed by subtracting (112) from (111). 



Finally, we turn to the update equation for '^j.^hiPr^h)- Exactly as above, the incoming variables and messages 
will be of mixed types. The only difference is that now the message to be computed is a function, and that the 
incoming variables Pf^r with V € dr\ I2 must satisfy the update equation (83) for Pr^i^- As before, the relevant 
energy contribution can be simplified by using this constraint to eliminate the dependence on Pi^^r, whose presence 
would undermine the application of the cavity method: 



£r^l,{Pr^l„{Pl,^r, I' G \ ^2}) = 



2Pr^uP, 



L I'edr 

= 2(1 — Pl2^r)Pr^l2 ^ 1 + '^Ph^rPr^h 



We then have: 



logP,^ijP,_ 



P] 



{Pi' 



max 

,G[o,i],i^e(ar\j2)nL2}, 



{Pi^^,.e{o,i},hedrnLi} s.t.-. 

p=nt>.or.\i^a'Pi'^^.) 



1-2P+ J2 l0gPi^r[Pi^r] 

ledr\l2 



(115) 

(116) 
(117) 



(118) 



which again can be computed efficiently thanks to (106). It is straightforward to substitute the messages ^r^i2 and 
instead of the cavity marginals. 

The coupled update equations for '^i^^r, ^r-s-d, ^la-s-r and '^r^h are solved by iteration. Notice that these 
are the only message passing equations that actually have to be implemented and solved numerically. The message 
passing equations for the second-stage MS messages mi^r and vrir^i , as well as those for the messages Pi^r and Pr^i 
introduced for the computation of the average over t, are only needed to derive the expression of the constraints to 
which the maximisations in the update equations for '^j.^i-^, ^i2->-r and '^^^12 are subject to. 

The optimal configuration Xi is determined from the fixed-point value of the messages ^'jj-i.r and "^r^ii as 



if ^l^^r + *r 

otherwise 



>;i + 2 > 



(119) 
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The constant addend +2 in the condition is due to the fact that the energy term 2Pi^^rPr^h = 2a;ijr associated to 
the edge {hr) is subtracted from the updates of both '^i^^r and "^r^h, so that it must be added back to the sum 
+ ^r^h ordcr to avoid double-counting it. 
In order to improve the convergence of the update equations we added both a noise and a reinforcement term to 
the messages propagating on the edges incident on vertices li € Li (i.e. to the messages that are needed to assign the 
binary variables xi-^r), with a technique similar to the one used in [28 1 129 j . The noise term is just a constant (small) 
random field rji-^r acting on each variable xj^^- Each rji-^^. is extracted uniformly in [0, ?7]. The reinforcement term is 
defined as follows. We introduce an esternal field Hi-^r acting on each variable xi-^r and a constant parameter p. Let 



us denote by a superscript the time t in the update sequence (i.e. the iteration number) . The update equation ( 98 ) 
becomes 

*Ur = ^h^riiK^Xi, , r' e dh \ r}) + tpHl;; + Til,, (120) 



where ^ij_j.r({^'*./J'j.;j , r' G dli \ r}) is the right-hand side of (98) computed at time t — 1. The update equation for 
is modified exaclty in the same way by adding the term tpHf~^^ + rji-^r- The external field is computed after 
each iteration t as the total field acting on the variable at the iteration t, 

= K^r + K^l, - tpKr' " ^hr + 2 (121) 

where the terms —tpHl~^ — rji^r + 2 are again included in the sum '^\^_^^ + in order to avoid double-counting 

them, and with = 0. In the presence of a reinforcement term, the values of the messages do not converge to a 
fixed point: on the contrary, typically some of them diverge. The convergence criterion used to stop the algorithm is 
then that the configuration of xi variables corresponding to the instantaneous value of the messages does not change 
for a number of iterations / (e.g. / = 100). 



C. Message passing solution of the first-stage problem in the three-valued case 

Let us now turn to the case in which the MS messages m/^-s-r and nir^i^ introduced for the optimization over X2 
take the three values {—1, 0, 1}. We shall see that this leads to different update equations for the second and third 
level messages {P and ^'(P) respectively), and therefore to a different algorithm. We shall proceed in close analogy 
to the two-valued case just discussed. 



1. Computing the average over t 



As before, we start by computing the average over t the expression (75 1, 

£*(xi) = Etmin£(xi,t,X2) . 



(122) 



The joint distribution of the messages rrii^^r and rrij—fi^ and of the stochastic parameters t has the same expression 
(82 1 as before, with the same update equations (74) and (73) defining the functions rhi^r and ifir-^i respectively. 

As before we introduce the cavity marginals Pi^^rifni^^r) and Pr^i^ijnr^i^) , but since now mi^^r and rrir^i^ 
take values in {—1,0, 1}, in order to parametrize them we need the three real numbers Pj^^^ — ^[^h^r = +1], 
P^^^r = ^[nT-h^r = 0] and Pi^^^ = F[rni^^r = —1] subject to the normalization condition Pil^j, + Pj^^^j, + Pj^^^ = 1 
(and similarly for P^_^i^, ^r^h ^'^^ -^r'-^h)' order to simplify the notation, we introduce similar quantities for the 
edges incident on the vertices in Li with the definitions P^^ = P~L, = 1 if x/.r = 1 and Pr^ = P _^i = 1 if 

Xl^r = 0. 

From the update equation (73) we see that mr^i is +1 if and only if all the incoming messages are —1 (for each 
I' Gdr\l), so that 

Pr\l^ n Pl'^r- (123) 
l'edr\l 

Moreover, nir^i is —1 if and only if at least one of the incoming mi'^r (with I' ^ dr\ I) is -1-1, so that 

Pr-.i = ^- n i^-Pr^r) ■ (124) 

l'edr\l 
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Similarly, we see from the update equation ( 74 1 that mi^r is +1 if and only ifti = 1 (which happens with probability 

(125) 



Pi), and all the incoming messages are —1 (for each r' £ dl\r), so that 



PI n Pr^^l 

r' ^dl\r 



while mi-^r is —1 if = (which happens with probability 1 — pi), or ii ti = 1 and at least one of the incoming 
messages nir'^i (with r' £ dl\ r) is +1, so that 



Pi^r = il-pi)+Pi 



1- n (1-^.'- 



r'^dl\r 



(126) 



Solving these coupled equations by iteration, we can compute £*{xi) in (122 1 as a function of the messages P; 



and P^^i- We start by noticing that when rrir^i G {—1, 0, 1} we have — max[— 1, max^g^; fTir->/] — — max^ga; rur^i, 
and that the probability that max^ga; Tn^-^i — —1 is pi Jlrea/ ^r->-i' while the probability that max,.gg; nrir^i = +1 is 

PA^-Y{r<,3li^-Pr^l)l^0 that 



Similarly, 



max[— 1, maxm,.-!.;] 



max[— 1, maxTO;_j.r] 



Pi 



.r^dl r<£dl 



n^r->.+n(i-^i.)-i' 



(127) 



(128) 



ledr 



ledr 



Finally, max[0, nir^i+mi^r] = 2 with probability Pit^r-^r^i^ ^'^'^ max[0, rrir^i +mi^r] = 1 with probability P;^^(l 

P^-.l - Pr-^l) + (1 - P^r - Pr-.r)Pr-^l that 

Et [max[0, mi^r + nir^i]] - ,(1 - P^^i) + (1 - P/I>JP,t., ■ 

We obtain: 



(129) 



i Vredl redl 

(;r) 



E 



n ^r->. + n(i - ^.^.) - 1 

ieOr /ear 



(130) 



2. Solving for xi 

In order to compute 

= argminEt min£(xi,t,X2) (131) 

XI X2 

we introduce the MS messages in terms of the cavity marginals 

= (132) 

*,,^,(P+, P-) - logPi,^, [P+_^,, = P+, P;;_^, = P-] + Ci.^r (133) 

and similarly for "^r^i^ and '^r^i^{P^ ,P^). The messages VP/^-i-r and ^'r-s-ii are real numbers, while the messages 
'^i2^r and ^'r-f/a are funtions with domain {(P+,P~) S [0,1]^ : P"*" + P^ < 1} and codomain ] — oo,0] (for an 
appropriate choice of the additive constants Ci^^r and Cr^i^). For numerical purposes we shall approximate each of 
these functions with an array of negative real numbers corresponding to finite size bins for the values of (P^, P~) in 
[0,1]2. 
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Let us now derive the update equation for ^i-^-yr- As in the two-valued case, we need to consider both the vertex 
and the edge energy contributions. Since -P;^_j.j. = P^-^i^ = 1 ~ ^h^r ~ ^ ~ ^r-^h ~ ^hri we only need to consider 
P^^j. for the variables on the outgoing edge and P^^i^ for the variables on the incoming ones. Also, we remind that 
pi-^ = 1 and we obtain: 

£i,^r{Pi+^rAP.t^i,,r' €dh\r}) 

.redh redh 

= 2(1-^^--.) n i^-PrUlJ-^ + '^Pi^^r 
r'€dli\r 

The MS equation is then 
logP,,^. [P+^^ - P+] 



[P,Uri^ - P--.1, ) + (1 - Pl-^r)PrUl^ (134) 

(135) 



|p+^j e{o,i}, r'ea/iXr} s.t.-. 



I -£,,^.(P+,{P+_,,^,/ e dh\r}) + logP..^,, [P+_,,J 

I r'edh\r 



For P+ = 1 all the incoming P^,^i^ must be and we obtain 

logPz,^. [P+^,. = 1] = -1 + logP.'^,, [P+^i^ = 0] , 

r'^dl\\r 

while for — the incoming P^^i_^ can either be ah 0, or one of them can be 1 and ah the other 0: 



logPz,_, [P+_^, = 0] = max I - 1 + ^ logP,,^,, [P+_^,^ = O] 



1 + max 

r'^dl\\r 



logP.,^,, [P+^i^ = 1] - logP.,^,, [P+^i^ = 0] + logP-'-'i = 0] 

r" £dl\\{r,r'} 



(136) 



(137) 



(138) 



= logP,,^, = + 



max 



so that 



0, max 

r'edhV 



0, max 4','-s.;, 
r'eaii\»- 



(139) 



(140) 



We now turn to the update equation for ^'j^-j.^lP^, P )• The variables on the outgoing edge, Pj^^^ ^^'^ P^^r^ 
are both in [0, 1] and they satisfy < P^^^ + Pi^^r — ^- same is true for the variables in the incoming edges. 



{{P^^i ,Pr'^i^)j £ \ J'}, which must also satisfy the constraints (125 1261 



(141) 
(142) 



P^'.^r^Pl. n Pr'^l. 
r'edhV 

p,;^. = i-w. n i^-PrUi.) 

r'€dl2\r 

for given P,+_^^ and Pi^^^- 

Again, the variables P^^^j.^ and Pj^^^ ^^PPear in both a vertex and an edge energy terms, and the energy contribution 
we need to consider is 

£l.-.r {{PiUr - Pr.^r): { [P^ ^l. : Pr' ^l. ),r'edh\ r}) 



.r'€dl2 r'Edh 
l-pi2+PlUr~Pl2^r 



P7-.lA^-Pi:^r) 



(143) 

[^-P:^-.l2)PlUr (144) 

(145) 
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where we made use of the constraints (141 142), and which only depends on the outgoing variables (notice that a 
priori it could also depend on P^_^i^ and P^_^i^, but does not). 
The MS equation is 



max 

-P+ , +P', , <1 (Vr'eSiaXr) 



eoi2\T 



-1+P,,-P+ + P-+ '^r'^l2{P^^l2^P;.^ 
r'£dl2\r 



(146) 



which can be computed efficiently thanks to a method similar to (106). We introduce, with an obvious simplification 
of notation. 



Gk{y+,y- 



max < N fi{ 

{(^+,^^)e[o,l]^i=l,...,fc} s.t.: [fr"^ 

^^+^r^i (vi=i,...,fc) 
y-=nti(i-2.+) 



(147) 



max 

(z+,z-)e[0,lf s.t.: 



max 



0<z+<l-y" 



/fe(4'^fc)+ , max 

{(^s ,Zi )e[0,l]^, i=l,...,fe-l} s.t.: 
z + +z:r<l (Vi=l,...,fc-1) 

y-/{i-zi)=n':zl(i-4) 
+ 



■fc-i 



/fe(z+,z,-) + Gfc_ 



y 



which is easily computed iteratively starting with 

Go{y+,y-) 



y_ 



if y+ = y 

— oo otherwise. 



(148) 



(149) 



(150) 



The update equation for ^'r-i-Zi is obtained in a similar way. We notice that the outgoing edge is connected to a 
deterministic vertex Zi, so we must have P^^^ — Pr->ii = 1 ~ Ph^r = 1 ^ Pr->h ^ 1} (and we can express all of 
them in terms of P;^_>.r), while the incoming edges are in part connected to deterministic vertices ![ G dr\li (and the 
corresponding variables satisfy the same relations as those on the outgoing edge) and in part connected to stochastic 
vertices I2 G dr (and the corresponding variables are in [0, 1]). 

The energy terms to be considered are again a vertex and an edge term. 



Sr^UPrt.1,, {PLi',^ l[ e dr\h}, {{PiU,,Pr2^,), h e dr}) 

= [ n Pl'^r + n (1 - Pl^-^r) - 1] + [nt,,,(l - P^-^r) + (1 " Pr^l,)Plt 



I'edr I'edr 

i^-pu) n Pi'^r+i^-pu) n (i-^p-..)-i+2P,t,,, 

l'edr\h l'edr\li 



(151) 
(152) 

(153) 



Notice that, again, this only depends on the "right" messages: P^_^i on the outgoing edge (but not Pj^^^), and 
P^^r '^^ incoming ones (but not P^_^i,)- The messages on the edges connected to Li vertices are subject to the 
matching constraint 



< 1 



(154) 



while the messages on the edges connected to L2 vertices (i.e. Pj^^,,) ^^'^ unconstrained, since the only constraints 
they are subject to are the BP update equations (125 1261 that define them in terms of the messages P^_^i^, and 
these do not appear in the expression of the energy. 
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The MS equation will then be: 



+ 



] }, (155) 



logP.^i, [P+_,i, = 0] = max [^({Pi,^, [P+^^, P^T^J , /' e cJr \ /i}) , 

i3({Pr^. [P^t^^Pp,^,] , I' e ar\;i})] (156) 

with 

I'^r 

{i- n Pi.^r- 11(1-^^-^.)+ 

I hedr hedr 



max 



P,7_ = 



J2 logP,,^.[P+^,,P,;^J , (157) 



and 



ij({p,^, [p+_^,p,7_,],rear\;i}) 



max 



1 + max 



l[edr\h hedr ) 



(158) 



Notice that all the maximisations are trivial, except the one appearing in A, which can nonetheless be computed 
efficiently exploiting its associativity with a method similar to those explained before: we introduce 



1=1 



G,{y\y-)= max | - fl - Rd + E .A( 

{(zj ,z.)e[0,l] ,l=l,...,fe} s.t.: .^^ 

y"=nti(i-^+) 
max 

(z+,z-)e[0,l]^ s.t.: 

y-<i-4<i 



(159) 



(160) 



and compute it iteratively starting with 



if y+ =y- = 1 
— oo otherwise . 



(161) 



Finally, let us derive the update equation for '^r^i2{P^^i^, Pj-^i.^)- Some of the incoming edges will bo connected 
to deterministic vertices li, with variables in {0, 1} satisfying the normalization constraint P(^_>r + ^h^r ~ ^ ^^'^ 
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the matching constraint X^iisSr ^h^r — ^- "^^^ remaining incoming edges will be connected to stochastic vertices /j, 
with continuous variables satisfying the normalization constraint Pit.^ + P,7.^ < 1 and the constraint 

Pr-^l.= n (162) 
ledr\l2 

Pr-.l2=^- n i^-P^-^r) (163) 
ledr\l2 



derived from the update equations (123 124) and involving the variables connected to all the incoming edges (both 
stochastic and deterministic). 

The energy term contains both a vertex and an edge contribution and is given by 

£r^i,{P+_^,^,p-_^,^,{P^,^r^h e 5r,{(P+_,,„F,7_,J, i;, e dr\h}) 

= n Pr^r + n (1 - Pl'-^r) 1 + Pl^^r (l ^^-^ J + (l " Pr2^r) P^-^l^ (1^4) 
= Pr-^l2 Pr-^l2 (165) 

which only depends on the variables on the outgoing edge. 
We can now write the MS equation as 

logP,^ijP+,p-]- max \p+ -P- +Y,\ogVi^^r[P^^^A + 



^ logP,i^.[P+^,,P,7^j|. (166) 



When P+ > 0, the constraint P+ = Iliear Pi^r forces all the Pj^^j. to be 0, and the equation simplifies as 
logP,^ijP+,p-] = max Jp+-P-+ ^ logPi,^,[P+_^,. -0] + 

{{P^^,.-P^^^)<^l0^^f-'''2^9r\l2}s.t.: [ i^f,Qr 



'■2 2 

ni^ear\i2 Pi'^^r=P • 



E l°gP'i-'-[<-^.'^;I-^j| (167) 

l'2€dr\l2 ) 



which is again of the form ( 160 1 



If instead P+ = 0, the constraint P+ = riiear-XZa Pi^r '■^^'^ ^e satisfied by setting to either one of the Pi^^^ (and 
only one, because of the matching constraint), or at least one of the Pj7^^- In the first case, the corresponding P;^_j.^ 
will be 1, and the constraint P^ = 1 — n/6ar\/2 (l ^ Pi^r) '^^^ satisfied only if P^ — 1. In the second case on the 
other hand P,7 , is a continuous variable and P~ can be smaller than 1. We then have 

logP,_ijO,P-]= max J -P-+ ^ logPz,^,-[P+_^, =0] + 



E ^^S^l2^r[P^,^r^P,vA. (168) 
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(a special case of the previous equation) and 

logP,^i,[0,l] =niax [A{{Pi^r[Pit,r^Pr^,], I e drXh}), 5({Pi^r.[P;^,,, P(I,J, I e dr\l2})] (169) 

with 



^({P,^,[P,+ „ PiZ^rh l^dr\ h}) = ^ ^ max \ - 1+ 

{(^it^,,p,7^Je[o,l]^^ie9r\^2} L 



and 



^ logP,,^.[P+^, = 0]+ ^OgFi,^AP,t^r^Pl^^r]\ 



(170) 



P({Pi^r[Pi+ „PiI,J, I e dr\l2}) = max - 1 



+ max 

hedr 



l0gP;,^.[P+^, = 1] - l0gP(,^.[P+^, = 0] + ^ l0gP;,^.[P+^, = 0] + 



hedr 



+ E logP^i-r[P;t^,,P,7^j|. (171) 

lo->-redr\l2 ) 



'2^redr\l2 

Notice that the maximisations in A and B arc unconstrained, and tlicrcforc immediate. 

Despite their appearence, these update equations are implemented straightforwardly. To improve the convergence 
properties of the algorithm, we added both a noise and a reinforcement term to the messages on the deterministic 
edges (as explained for the two- valued case). 



D. Numerical results 



In order to validate our approach, we performed three series of numerical tests. First, we compared the results of 
the two- and three-valued versions of the algorithm. As we shall see, both algorithms give solutions with energies that 
are very close to each other both for small and large connectivities, the main difference between the two algorithms 
beeing the running time. Second, we compared the performance of the two-valued algorithm with a greedy heuristic 
based on the average of p{t). We shall see that the two- valued algorithm finds solutions with significantly smaller 
energy when c > e. Third, we compared the performance of the two-valued algorithm with the standard method 
used to solve two-stage optimization problems: stochastic programming. We find that stochastic programming has 
an acceptable running time for c < e (but still takes ~100 times longer than the two- valued algorithm to find a 
solution with the same energy), while for c > e its time performance worsens dramatically, and it becomes practically 
impossible to solve instances with \Li\ = 1 000 vertices and c = 3.0. 

In all three cases we did extensive numerical simulations to compare the performance of the different algorithms, 
both in terms of the energy of the solution obtained and in terms of the running time (and, crucially, its scaling with 
the size of the system). 



1. Comparison between the two- and three-valued results 

As a first test, we did a series of comparisons between the results of the two- and three- valued versions of the 
algorithms on the same set of instances. As we mentioned previously, the three-valued version has a running time 
which is much longer than the two-valued version, so we did this comparison on relatively small-sized instances, with 
\Li\ = 300 and IL2I = |P| = 600. Since the probabilities pi^ are drawn imiformly in [0, 1], the typical "final" instance 
is roughly balanced, with \L\ ~ 600 = |P|. We used both reinforcement and noise as described previously, with 
parameters p = 0.001 and 77 = 0.001 and with a number of bins B = 10 to discretize the one- and two-dimensional 
distributions. The values of these parameters were chosen based on a separate series of comparative runs with several 
values of p, fj and B on the same ensemble of instances. The convergence criterion used in the presence of reinforcement 
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is that the values of the Xi variables do not change for / = 100 iterations. Each data point is computed as the average 
over ~ 350 instances. For each instance the energy is computed by extracting a sample oi S = 300 realizations of the 
stochastic parameters t, computing the optimal X2 for each realization, and averaging the corresponding energy over 
the sample. The total running time with these parameters for typical instances is of the order of 1 second for the 
two-valued algorithm and a few minutes for the three-valued one, and we obtained convergence for all the instances 
we tried. The average energies obtained in these runs are shown in Table |l] 



c 


Two- valued 


Three- valued 


Difference 


2.0 


274.94 ±0.54 


275.16 ±0.54 


0.78 ±0.69 


2.5 


190.84 ±0.47 


190.96 ±0.47 


0.88 ±0.72 


3.0 


129.42 ±0.42 


129.50 ±0.41 


0.72 ±0.57 


3.5 


84.95 ±0.32 


85.02 ±0.32 


0.76 ±0.62 


4.0 


54.58 ±0.25 


54.54 ±0.25 


0.75 ±0.60 


5.0 


25.69 ±0.19 


25.87 ±0.18 


0.89 ± 1.02 


6.0 


18.86 ±0.18 


19.26 ±0.17 


1.84 ± 1.69 



TABLE I. Comparison of the average energy obtained with the two- and three-valued algorithms on the same set of instances. 
The average connectivity of vertices in L is c. The last column gives the average value and standard deviation of the single- 
sample absolute value of the difference between the two energies. We see that this difference is smaller than 1% for c < 3.5, 
smaller than 4% for c = 4.0 and c — 5.0, and becomes relatively large for c — 6.0, when the energy itself is very small. 

As expected, the average energy is exactly the same for c < e. In fact, in runs without reinforcement (in which the 
messages converge to finite values) , we also verified that for c < e almost all the elements of the three- valued messages 
that represent states with P^^r 7^ or Pr^^ 7^ ^ equal to minus infinity, which means that the corresponding 
cavity marginals are concentrated on the states described by the two-valued fixed points. In fact, for c = 2.0 the 
average number of message elements corresponding to Pi^^r 7^ or Pr.^l ^^"^ with finite values is 0.45 ± 0.01 
(out of 45 matrix elements), the average value of these finite fields being —42.2 ± 1.0, while for c — 2.5 these averages 
are respectively 1.85 ± 0.03 and -14.0 ± 0.3. 

What is more surprising is that the two-valued algorithm gives results that are almost identical to the three-valued 
one also for c > e. Even in this case we have verified that a small number of message elements corresponding to 
Pl^r 7^ or Pr^i^ 7^ are different from minus infinity: their average number is between 3.5 and 5.4 (depending 
on c, and out of 45 matrix elements), and their average value is between —6.8 ± 0.2 and —12.9 ± 0.4. This means 
that also for c > e the deviation from a two-valued distribution is small, and helps to explain why the two-valued 
algorithm has such a good performance. However, for c > e some instances do not converge without reinforcement, 
so this conclusion is only limited to those instances for which convergence is obtained even without reinforcement 
(between 37% and 99% of the instances, depending on c). 

It is not clear to us why the three-valued solution displays these features, and in particular whether this is a sign 
that RSB does not occur in this ensemble of instances. Anyhow, since the energy obtained with the two-valued 
algorithm is so good, and its running time is much shorter than for the three- valued one, we have used the two- valued 
algorithm for all the other tests, both for small and large connectivities. 



2. Comparison with the greedy heuristic 



We consider the following greedy heuristic. Given an instance of the problem, specified by the graph G = {L, R; E) 
with L partitioned in Li and L2 and by the probabilities p = {pi^, h G L2}, we assign the first stage variables Xi by 
solving the maximum- weight matching problem with graph G and with weights Wi on the vertices i G LU R equal to 
1 for the vertices li g Li and r € R, and equal to the probabilities p;, for vertices I2 G ^2- To keep the notation as 
similar as possible to the previous one, we can state the problem as follows: 



greedy _ 



arg mm mm • 



.r£dh 



Xl^r = 



E 

i2eL2 



Ph 



El 

reR. 



.ledr 



Xlr = 



(172) 



subject to the matching constraints (70). Once xf'°'^'^^ is found, we compute the average energy as in the previous 
paragraph by sampling over 300 realizations of the stochastic parameters t and finding the optimal Xj corresponding 
to each realization, and then averaging the energy. As a lower bound to the optimal energy, we also consider the 
offline optimum obtained with full prior knowledge of t. 
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FIG. 5. Average energy of the two-valued algorithm vs. the greedy heuristic and the offline optimum. The error bars are 
smaller than the symbols. 



We have compared the results of the two- valued algorithm with the greedy heuristic and the offline optimum for an 
ensemble of instances with \Li \ — 1000 and IL2I = \R\ = 2 000. As before, the value of the reinforcement parameter 
is p = 0.001 and the value of the noise parameter is 77 = 0.001, but the number of bins is increased to B = 30 to 
improve the numerical accuracy (we verified that while there is a small improvement of the energy going from _B = 10 
to -B = 30, there is almost no further improvement going to -B = 100). The number of iterations with constant xi 
required as a convergence criterion is / = 300. Each data point is an average of 50 to 100 instances (depending on c). 
The total running time of the two-valued algorithm with these parameters on this ensemble of instances is typically 
less than 1 minute. The results of these simulations are shown in Figure [5] 

We find that the two-valued algorithm always succeeds in finding a solution with a smaller energy than the greedy 
heuristic. For c < e the difference between the two is very small, and the greedy heuristic is in fact very close to 
the lower bound for the optimum obtained from the offline solution. As c approaches e the gap between the greedy 
heuristic and the two-valued algorithm increases, and it becomes larger than 50% of the energy of the two-valued 
algorithm for c > 4. 



3. Comparison with Stochastic Programmmg 

Having verified that a simple greedy heuristic fails to provide close-to-optimal solutions for the problem, we have 
compared the performance of the two-valued algorithm with the standard technique in the field: stochastic pro- 
gramming. This technique consists in extracting S realizations {t^,...,t'^} of the stochastic parameters from the 
distribution p{t) and then observing that 

1 ^ 

min \ p(t) min£(xi, t, X2) ~ min — \ min£(xi,t'*,X2) (173) 

xi ^ — ' X2 xi S ^ — ' xS 

t s=l ^ 



1 

- inin ^^f(xl,t^x5) (174) 



5x, 

and the last problem is a standard offline optimization problem that can be solved using OR techniques like linear 
relaxations complemented with branch-and-bound. This approach suffers generally from two separate drawbacks: one 



is the approximation in (173) and the second is that the minimization problem in (174) is NP-Complete [5]. 

We employed two well known tools for this task: iLog CPLEX, a commercial, industrial-strenght linear/integer 
programming software from IBM, and lp_solve, an open source alternative. Although qualitatively similar, results 
with lp_solve were uniformly worse than the ones of CPLEX, so we will not report them. 
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FIG. 6. Average energy of the solution obtained with stochastic programming as a function of the number of samples 5 for a 
single instance with c = 2.5 and Li = 1000, IL2I — \R\ — 2 000. For each value of 5, the energy of the solution obtained for 
xi has been computed by resampling over 10 000 realizations of t, finding the optimal X2 corresponding to each realization, 
and averaging the corresponding energies. The horizontal lines correspond to the average energy obtained with the two- valued 
algorithm and its error bar. 
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FIG. 7. CPLEX running times (in seconds) as a function of the number of samples for the same instance of Figure [6] As a 
comparison, the running time of the two-valued algorithm is of 60 seconds on the same computer for this instance. 



We observe that the results obtained with stochastic programming depend strongly on S and on the average degree 
c. As expected, for fixed c the quality of the solution improves as S increases (Figure|6|, but the running time becomes 
larger (Figure [?]). 

For c < e, CPLEX seems to be able to solve the problem in polynomial time in both S and N, but either it is 
much slower than the SP-derived algorithm or it gives a significantly higher energy (depending on S). For c > e, the 
time scaling of CPLEX worsens significantly: for S = 10, the running time increases dramatically with \Li\ (Figure 
[s]), and for \Li\ = 1 000 CPLEX was not able to attain an optimum under a cutoff of 24 hours even for S = 2. 
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FIG. 8. CPLEX running times (in seconds) as a function of |Lij for 5 = 10 and four different values of c. There seems to be 
some transition in this behaviour loosely around c ~ e. 



VI. CONCLUSION 



We discussed the technical details which arise in the generalization of the message passing algorithm introduced in 
Ref. |18| . In particular, we applied the general scheme to the stochastic maximum weight independent set problem and 
to stochastic matching problems. Extensive numerical comparisons with local search algorithm based on sampling, 
linear programming methods and greedy algorithms corroborate the idea that the message-passing approach is a 
valuable alternative to such the traditional techniques. As a concluding remark we should mention that the method 
described in this work is in fact not limited to stochastic optimization problems. There are lots of relevant problems 
in which one is interested in optimizing a cost function that is hard to compute, for example an entropy function or 
a free energy. Our approach to stochastic optimization problems could also be adapted to study these issues. 
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